5,367 research outputs found

    Enhancing the effectiveness of ligand-based virtual screening using data fusion

    Data fusion is being increasingly used to combine the outputs of different types of sensor. This paper reviews the application of the approach to ligand-based virtual screening, where the sensors to be combined are functions that score molecules in a database on their likelihood of exhibiting some required biological activity. Much of the literature to date involves the combination of multiple similarity searches, although there is also increasing interest in the combination of multiple machine learning techniques. Both approaches are reviewed here, focusing on the extent to which fusion can improve the effectiveness of searching when compared with a single screening mechanism, and on the reasons that have been suggested for the observed performance enhancement
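
As a minimal sketch of the idea described above, the following combines several scoring functions by summing the rank each one assigns to a molecule. The rank-sum rule, the function name `fuse_by_rank_sum`, and the molecule identifiers are illustrative assumptions; the abstract does not specify which fusion rule is used.

```python
def fuse_by_rank_sum(score_lists):
    """Fuse several scorings of the same molecules by summing their ranks.

    score_lists: list of dicts mapping molecule id -> score (higher is better).
    Returns molecule ids ordered from best to worst fused rank.
    The rank-sum rule is an assumed example of a fusion rule.
    """
    molecules = list(score_lists[0].keys())
    totals = {m: 0 for m in molecules}
    for scores in score_lists:
        # Rank 1 goes to the highest-scoring molecule for this scoring function.
        ordered = sorted(molecules, key=lambda m: scores[m], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    # A smaller rank total means the molecule was ranked highly more consistently.
    return sorted(molecules, key=lambda m: totals[m])


# Hypothetical scores from two different similarity searches.
scores_a = {"mol1": 0.9, "mol2": 0.4, "mol3": 0.7}
scores_b = {"mol1": 0.2, "mol2": 0.3, "mol3": 0.8}
fused = fuse_by_rank_sum([scores_a, scores_b])
```

Here `mol3` is ranked second by one search and first by the other, so it comes out on top of the fused list even though neither search alone agrees on the full ordering.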

    The Evaluation Of Molecular Similarity And Molecular Diversity Methods Using Biological Activity Data

    This paper reviews the techniques available for quantifying the effectiveness of methods for molecular similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, with rather different criteria needing to be used depending on the type of data available

    The Porter stemming algorithm: then and now

    Purpose: In 1980, Porter presented a simple algorithm for stemming English language words. This paper summarises the main features of the algorithm, and highlights its role not just in modern information retrieval research, but also in a range of related subject domains. Design: Review of literature and research involving use of the Porter algorithm. Findings: The algorithm has been widely adopted and extended so that it has become the standard approach to word conflation for information retrieval in a wide range of languages. Value: The 1980 paper in Program by Porter describing his algorithm has been highly cited. This paper provides a context for the original paper as well as an overview of its subsequent use

    The Journal of Computer-Aided Molecular Design: a bibliometric note

    Summarizes the articles in, and the citations to, volumes 2-24 of the Journal of Computer-Aided Molecular Design. The citations to the journal come from almost 2000 different sources that span a very wide range of academic subjects, with the most heavily cited articles being descriptions of software systems and of computational methods

    Brief communication: Gender differences in publication and citation counts in librarianship and information science research

    An analysis is presented of the publications by, and citations to, 57 male and 48 female academics in five departments of librarianship and information science. After taking account of differences in subject and differences in numbers of academics, it is shown that male academics publish significantly more papers on average than do female authors, but that there is no significant difference in the numbers of citations to published papers

    ArticleRank: a PageRank-based alternative to numbers of citations for analysing citation networks

    Purpose - The purpose of this paper is to suggest an alternative to the widely used Times Cited criterion for analysing citation networks. The approach involves taking account of the natures of the papers that cite a given paper, so as to differentiate between papers that attract the same number of citations. Design/methodology/approach - ArticleRank is an algorithm that has been derived from Google's PageRank algorithm to measure the influence of journal articles. ArticleRank is applied to two datasets - a citation network based on an early paper on webometrics, and a self-citation network based on the 19 most cited papers in the Journal of Documentation - using citation data taken from the Web of Knowledge database. Findings - ArticleRank values provide a different ranking of a set of papers from that provided by the corresponding Times Cited values, and overcome the inability of the latter to differentiate between papers with the same numbers of citations. The difference in rankings between Times Cited and ArticleRank is greatest for the most heavily cited articles in a dataset. Originality/value - This is a novel application of the PageRank algorithm
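
The following sketch illustrates the underlying idea with standard PageRank run on a small citation graph. It is not the ArticleRank algorithm itself, whose exact formula the abstract does not give; the function name `pagerank`, the damping value, and the toy graph are assumptions made for illustration.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Standard PageRank by power iteration on a citation graph.

    citations: dict mapping a citing paper to the list of papers it cites.
    Returns a dict mapping each paper to its score (scores sum to 1).
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its score equally to the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # A paper citing nothing spreads its score over all papers.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Toy graph: B and C each receive exactly one citation.
toy_citations = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["C", "A"]}
scores = pagerank(toy_citations)
```

In this toy graph B and C have the same citation count, but B's only citing paper cites nothing else while C's citing paper splits its score between two references, so B receives the higher score. This is the kind of distinction a plain citation count cannot make.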

    Maximum common subgraph isomorphism algorithms for the matching of chemical structures

    The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks

    Evaluation of the EVA Descriptor for QSAR Studies: 3. The use of a Genetic Algorithm to Search for Models with Enhanced Predictive Properties (EVA_GA)

    The EVA structural descriptor, based upon calculated fundamental molecular vibrational frequencies, has proved to be an effective descriptor for both QSAR and database similarity calculations. The descriptor is sensitive to 3D structure but has an advantage over field-based 3D-QSAR methods inasmuch as structural superposition is not required. The original technique involves a standardisation method wherein uniform Gaussians of fixed standard deviation (σ) are used to smear out frequencies projected onto a linear scale. This smearing function permits the overlap of proximal frequencies and thence the extraction of a fixed dimensional descriptor regardless of the number and precise values of the frequencies. It is proposed here that there exist optimal localised values of σ in different spectral regions; that is, the overlap of frequencies using uniform Gaussians may, at certain points in the spectrum, either be insufficient to pick up relationships where they exist or mix up information to such an extent that significant correlations are obscured by noise. A genetic algorithm is used to search for optimal localised σ values using crossvalidated PLS regression scores as the fitness score to be optimised. The resultant models are then validated against a previously unseen test set of compounds. The performance of EVA_GA is compared to that of EVA and analogous CoMFA studies
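
A minimal sketch of the uniform Gaussian smearing step described above might look as follows. The spectral range, sampling step, default σ, and the function name `eva_descriptor` are assumptions chosen for illustration; the abstract does not state the values used, and the localised-σ genetic algorithm search is not shown.

```python
import math


def eva_descriptor(frequencies, sigma=10.0, start=0.0, stop=4000.0, step=5.0):
    """Smear vibrational frequencies with uniform Gaussians and sample the sum.

    frequencies: list of frequency values on a linear scale.
    Returns a fixed-length list regardless of how many frequencies are given.
    Range, step, and sigma are assumed values, not taken from the paper.
    """
    n_points = int((stop - start) / step) + 1
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    descriptor = []
    for k in range(n_points):
        x = start + k * step
        # Sum the contribution of every Gaussian at sampling point x.
        value = sum(
            norm * math.exp(-((x - f) ** 2) / (2.0 * sigma ** 2))
            for f in frequencies
        )
        descriptor.append(value)
    return descriptor


# Two hypothetical molecules with different numbers of frequencies.
d1 = eva_descriptor([1000.0, 1600.0])
d2 = eva_descriptor([500.0, 1200.0, 3000.0])
```

Both descriptors have the same length even though the molecules have different numbers of frequencies, which is what allows them to be compared or used as regression inputs without structural superposition.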

    Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases

    This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening
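
As an illustration of a fingerprint-based similarity search, the sketch below ranks a small database against a query using the Tanimoto coefficient. The choice of Tanimoto, the representation of a fingerprint as a set of on-bit positions, and all names and data are assumptions; the abstract does not say which coefficient was used with the BCI, Daylight, or Unity fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints share no features; treat similarity as zero.
        return 0.0
    return len(a & b) / union


def rank_database(query_fp, database):
    """Return database molecule ids sorted by decreasing similarity to the query."""
    return sorted(
        database,
        key=lambda name: tanimoto(query_fp, database[name]),
        reverse=True,
    )


# Hypothetical fingerprints: each set lists the bit positions that are set.
query = {1, 2, 3}
database = {"molA": {2, 3, 4}, "molB": {1, 2, 3, 5}, "molC": {7, 8}}
ranking = rank_database(query, database)
```

A graph-based measure would replace `tanimoto` with a score derived from the size of the common subgraph between two molecular graphs, leaving the ranking step unchanged.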